Sequence alignment and mutual information
Abstract
Background: Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. All existing alignment algorithms rely on heuristic scoring schemes based on biological expertise. Therefore, these algorithms do not provide model independent and objective measures for how similar two (or more) sequences actually are. Although information theory provides such a similarity measure – the mutual information (MI) – previous attempts to connect sequence alignment and information theory have not produced realistic estimates for the MI from a given alignment.
Results: Here we describe a simple and flexible approach to get robust estimates of MI from global alignments. For mammalian mitochondrial DNA, our approach gives pairwise MI estimates for commonly used global alignment algorithms that are strikingly close to estimates obtained by an entirely unrelated approach – concatenating and zipping the sequences.
Conclusions: This remarkable consistency may help establish MI as a reliable tool for evaluating the quality of global alignments, judging the relative merits of different alignment algorithms, and estimating the significance of specific alignments. We expect that our approach can be extended to establish further connections between information theory and sequence alignment, including applications to local and multiple alignment procedures.
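The "concatenating and zipping" comparison mentioned in the abstract can be illustrated with a generic compression-based estimate, I(A;B) ≈ C(A) + C(B) − C(AB), where C(·) is the compressed size. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: the choice of zlib, the compression level, and the function names `compressed_size` and `zip_mi_estimate` are all hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def zip_mi_estimate(a: str, b: str) -> int:
    """Rough compression-based MI estimate, in bytes.

    Assumption: C(a) + C(b) - C(a + b) stands in for I(A;B).
    zlib is a stand-in compressor; its 32 KB window limits how far
    back in the concatenation a match can be found, so this is only
    meaningful for short sequences.
    """
    size_a = compressed_size(a.encode("ascii"))
    size_b = compressed_size(b.encode("ascii"))
    size_ab = compressed_size((a + b).encode("ascii"))
    return size_a + size_b - size_ab
```

Multiplying the result by 8 gives a value in bits. Related sequences should score higher than unrelated ones, because the compressor can encode the second sequence largely as references to the first. A general-purpose compressor is a crude proxy, so the absolute values should be read with caution.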
Similar resources
Microsoft Word - JBIO_Proteins Sequence Alignment
Abstract: Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. One important research topic in bioinformatics is multiple protein sequence alignment. Since exact methods for MSA have exponential time complexity, heuristic approaches and progressive alignment are the most commonly used in multiple...
InterMap3D: predicting and visualizing co-evolving protein residues
Summary: InterMap3D predicts co-evolving protein residues and plots them on the 3D protein structure. Starting with a single protein sequence, InterMap3D automatically finds a set of homologous sequences, generates an alignment and fetches the most similar 3D structure from the Protein Data Bank (PDB). It can also accept a user-generated alignment. Based on the alignment, co-evolving residues ar...
Substitution Matrices and Mutual Information Approaches to Modeling Evolution
Substitution matrices are at the heart of bioinformatics: sequence alignment, database search, phylogenetic inference and protein family classification are based on Blosum, Pam, JTT, mtREV24 and other matrices. These matrices provide means of computing models of evolution and assessing the statistical relationships amongst sequences. This paper reports two results; first we show how Bayesian and grid...
MISTIC: mutual information server to infer coevolution
MISTIC (mutual information server to infer coevolution) is a web server for graphical representation of the information contained within an MSA (multiple sequence alignment) and a complete analysis tool for mutual information networks in protein families. The server outputs a graphical visualization of several information-related quantities using a Circos representation. This provides an integra...
Inferring protein-DNA dependencies using motif alignments and mutual information
Motivation: Mutual information can be used to explore covarying positions in biological sequences. In the past, it has been successfully used to infer RNA secondary structure conformations from multiple sequence alignments. In this study, we show that the same principles allow the discovery of transcription factor amino acids that are coevolving with nucleotides in their DNA-binding targets. R...
Publication date: 2008